
    A latent model for ad hoc table retrieval

    © Springer Nature Switzerland AG 2020. The ad hoc table retrieval task is concerned with satisfying a query with a ranked list of tables. While there are strong baselines in the literature that exploit learning to rank and semantic matching techniques, there is still a set of hard queries that are difficult for these baseline methods to address. We find that such hard queries are those whose constituent tokens (i.e., terms or entities) are not fully or partially observed in the relevant tables. We focus on proposing a latent factor model to address such hard queries. Our proposed model factorizes the token-table co-occurrence matrix into two low-dimensional latent factor matrices that can be used for measuring table and query similarity even if no shared tokens exist between them. We find that the variant of our proposed model that considers keywords provides statistically significant improvement over three strong baselines in terms of NDCG and ERR.
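
    The factorization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the toy vocabulary, the counts, the plain stochastic gradient descent, and the names factorize and score_query are all assumptions made for illustration. It only shows how a query can receive a score for a table with which it shares no token.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def factorize(counts, rank=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Approximate counts (tokens x tables) as token_f . table_f^T using
    stochastic gradient descent on squared error with L2 regularization."""
    rng = random.Random(seed)
    n_tokens, n_tables = len(counts), len(counts[0])
    token_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_tokens)]
    table_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_tables)]
    for _ in range(epochs):
        for i in range(n_tokens):
            for j in range(n_tables):
                err = counts[i][j] - dot(token_f[i], table_f[j])
                for k in range(rank):
                    t, d = token_f[i][k], table_f[j][k]
                    token_f[i][k] += lr * (err * d - reg * t)
                    table_f[j][k] += lr * (err * t - reg * d)
    return token_f, table_f

def squared_error(counts, token_f, table_f):
    return sum((counts[i][j] - dot(token_f[i], table_f[j])) ** 2
               for i in range(len(counts)) for j in range(len(counts[0])))

def score_query(query_tokens, vocab, token_f, table_vec):
    """Average the latent vectors of known query tokens, then take the
    dot product with the latent vector of one table."""
    rows = [token_f[vocab.index(t)] for t in query_tokens if t in vocab]
    if not rows:
        return 0.0
    query_vec = [sum(r[k] for r in rows) / len(rows) for k in range(len(table_vec))]
    return dot(query_vec, table_vec)

# Hypothetical toy data: rows are tokens, columns are tables t0, t1, t2.
VOCAB = ["capital", "city", "population", "river"]
COUNTS = [
    [1, 1, 0],  # "capital" appears in t0 and t1
    [1, 0, 0],  # "city" appears only in t0
    [0, 1, 0],  # "population" appears only in t1
    [0, 0, 1],  # "river" appears only in t2
]

token_f, table_f = factorize(COUNTS)
# The query "city" shares no token with t1 or t2, yet each table gets a latent score.
scores = [score_query(["city"], VOCAB, token_f, t) for t in table_f]
print(scores)
```

    In this toy setup the scores depend on the random seed and the chosen rank, so they are printed rather than stated here.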

    Identifying Major Tasks from On-line Reviews

    © 2017 The Authors. Published by Elsevier B.V. Many e-commerce websites allow customers to provide reviews that reflect their experiences and opinions about the business's products or services. Such published reviews potentially benefit the business's reputation, improve both current and future customers' trust in the business, and accordingly improve the business. Negative reviews can inform the merchant of issues that, when addressed, also improve the business. However, when reviews reflect negative experiences and the merchant fails to respond, the business faces potential damage and loss of reputation and trust. We present the Sentiminder system that identifies reviews with negative sentiment, organizes them, and helps the merchant develop a plan with an end date by which issues will be addressed. In this paper we address the problem of quickly finding subtasks in a large set of reviews, which may help the merchant to identify, from the set of reviews, subtasks that need to be addressed. We do this by identifying nouns that frequently occur only in the reviews with negative sentiment.
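
    The last step of this abstract, finding nouns that occur frequently but only in negative reviews, can be sketched as follows. This is an illustration, not the Sentiminder implementation: it assumes that sentiment labels and noun lists have already been produced by some upstream classifier and part-of-speech tagger, and the function name, the min_count threshold, and the sample reviews are hypothetical.

```python
from collections import Counter

def negative_only_nouns(reviews, min_count=2):
    """Return (noun, count) pairs for nouns that appear in at least
    min_count negative reviews and in no non-negative review.

    reviews: iterable of (sentiment, nouns) pairs, where sentiment is a
    string label and nouns is a list of already-extracted nouns.
    """
    negative_counts = Counter()
    seen_elsewhere = set()
    for sentiment, nouns in reviews:
        unique = set(nouns)  # count each noun at most once per review
        if sentiment == "negative":
            negative_counts.update(unique)
        else:
            seen_elsewhere.update(unique)
    candidates = [(noun, n) for noun, n in negative_counts.items()
                  if n >= min_count and noun not in seen_elsewhere]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical pre-tagged reviews.
REVIEWS = [
    ("negative", ["delivery", "box", "refund"]),
    ("negative", ["delivery", "refund"]),
    ("negative", ["delivery", "price"]),
    ("positive", ["price", "quality"]),
]

print(negative_only_nouns(REVIEWS))  # [('delivery', 3), ('refund', 2)]
```

    Here "box" is dropped because it occurs in only one negative review, and "price" is dropped because it also occurs in a positive review.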

    Relevance-based entity selection for ad hoc retrieval

    © 2019. Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with the state-of-the-art keyword and entity based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the NDCG@20 and ERR@20 metrics. We also show that the proposed method is most effective on the difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method and illustrate that it provides more effective retrieval for all expansion weights and different numbers of expansion entities.

    Temperature Forecasts with Stable Accuracy in a Smart Home

    © 2016 The Authors. We forecast internal temperature in a home with sensors, modeled as a linear function of recent sensor values. When delivering forecasts as a service, two desirable properties are that forecasts have stable accuracy over a variety of forecast horizons, so that service levels can be predicted, and that the forecasts rely on a modest amount of sensor history, so that forecasting can be restarted soon after any data outage due to, for example, sensor failure. From a publicly available data set, we show that sensor values over the past one or two hours are sufficient to meet these demands. A standard machine learning method based on forward stepwise linear regression with cross validation gives forecasts whose out-of-sample errors increase slowly as the forecast horizon increases, and that are accurate to within one fifth of a degree C over three hours, and to within about one half degree C over six hours, based on one or two hours of history. Previous results from this data achieved errors within one degree C over three hours based on five days of history.

    Cloud-based Lineament Extraction of Topographic Lineaments from NASA Shuttle Radar Topography Mission Data

    © 2016 The Authors. This paper presents initial results of a Java™-based feature extraction tool, which represents a standard implementation of a hill-shading algorithm that transforms a 2D image to a pseudo-3D image to enhance edge contrast, in combination with the Canny edge detection algorithm, which performs segmentation to produce multidirectional sun-shaded images and their edges. Our goal is firstly to automate this process in Java™ to obtain multidirectional optimization of edge discovery and secondly to scale this algorithm to the complete SRTM raster collection at multiple pixel resolutions to document the distribution of Earth topographic discontinuities from continental to regional and local scales, respectively on the order of 1000s, 100s and 10s of kilometers. This tool will support the automatic extraction of lineaments from the transformed images to predict the existence of linear features that can often be found in association with ore deposits and landslides, if they represent tectonic lineaments. The collection of processed big data represents a multi-scale data repository that may find use for these and other geological and environmental applications. We present preliminary outputs from a case study conducted in the Flin-Flon greenstone belt in Canada, which is well known for its base-metal endowment. In this study, two main shaded relief images with multidirectional illumination were created in Java, each with four azimuth angles of the light sources, from which our developed tool automatically extracts multiple lineaments. The extracted lineaments represent both positive and negative elevation breaks, due to sudden slope inversions identifying dominantly crest lines and valleys. Preliminary results show good agreement with drainage networks, mapped fault lines and orientations of structures measured in the field. The main trends of the extracted lineaments of both images are NW-SE, N-S, E-W and NE-SW.

    Identifying major tasks and minor tasks within online reviews

    © 2017 Elsevier B.V. Many e-commerce websites allow customers to provide reviews that reflect their experiences and opinions about products and services. Such published reviews, whether positive or negative, serve both the consumer and the business. Negative reviews can inform the merchant of issues that, when addressed, may improve the addressed aspect of the business and improve its online reputation. However, when the merchant fails to respond to customers’ concerns, the business faces potential loss of reputation. The Sentiminder system identifies major areas of customer concern, and specific concerns within each area. This helps the merchant to process a large body of reviews and find what needs to be addressed. In this paper we address the problems of quickly finding specific issues and specific comments that are consistently discussed in a negative way. Our technique drills down from the major task areas to more specific issues, assisting the user to accurately determine what issues need attention. The sentiment of reviews on the same topic can vary widely, so we maximize coherence over six different sentiment assessment techniques. We achieve about 45% to 65% coherence. These suggestions are implemented in the Sentiminder, an online tool that creates schedules of optimal selections of tasks.

    Selecting Sensors when Forecasting Temperature in Smart Buildings

    © 2017 The Authors. Published by Elsevier B.V. Forecasts of temperature in a smart building, i.e. one that is outfitted with sensors, are computed from data gathered by these sensors. Model predictive controllers can use accurate temperature forecasts to save energy by optimally using Heating, Ventilation and Air Conditioners while achieving comfort. We report on experiments from such a house, in which we select different sets of sensors, build a temperature model from each set, and then compare the accuracy of these models. While a primary goal of this research area is to reduce costs by reducing energy consumption, in this paper, besides the cost of energy, we consider the cost of data collection and management. Each sensor employed in the forecast calculation incurs costs for installation and maintenance and an incremental cost for computation. Some sensors, however, may contribute little or no improvement to the forecast accuracy. We incrementally construct sets of sensors until we arrive at a set for which no superset produces a better forecast. Then we construct a successive series of subsets, such that forecast accuracy degrades slowly. As each sensor is removed, on the one hand, the forecast error increases, so the energy costs may increase for a given controller. On the other hand, the costs for installing sensors and for computing models are reduced. By considering this tradeoff over the series of sets, an optimal set of sensors can be found to be used with that controller.
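
    The grow-then-shrink procedure described in this abstract can be sketched as a greedy search. This is a simplified illustration, not the paper's method: it only tries adding or removing one sensor at a time, and the forecast_error callback, the toy gain table, and the linear cost in best_tradeoff are all hypothetical stand-ins for fitting a real forecasting model and pricing real sensors.

```python
def forward_select(sensors, forecast_error):
    """Greedily add the sensor that lowers the error most; stop when no
    single addition produces a strictly better forecast."""
    selected, remaining, best = [], list(sensors), float("inf")
    while remaining:
        err, sensor = min((forecast_error(selected + [s]), s) for s in remaining)
        if err >= best:
            break
        selected.append(sensor)
        remaining.remove(sensor)
        best = err
    return selected, best

def backward_series(selected, forecast_error):
    """Starting from the selected set, repeatedly drop the sensor whose
    removal increases the error least, recording each subset."""
    current = list(selected)
    series = [(list(current), forecast_error(current))]
    while len(current) > 1:
        err, dropped = min(
            (forecast_error([s for s in current if s != x]), x) for x in current
        )
        current = [s for s in current if s != dropped]
        series.append((list(current), err))
    return series

def best_tradeoff(series, cost_per_sensor):
    """Pick the subset minimizing error plus a per-sensor cost
    (assumes both are expressed in the same cost units)."""
    return min(series, key=lambda item: item[1] + cost_per_sensor * len(item[0]))

# Hypothetical toy error model: each sensor removes a fixed amount of error.
GAINS = {"indoor_temp": 8, "outdoor_temp": 4, "humidity": 1, "door": 0}

def toy_error(sensor_set):
    return 20 - sum(GAINS[s] for s in sensor_set)

selected, error = forward_select(GAINS, toy_error)
series = backward_series(selected, toy_error)
print(selected, error)           # ['indoor_temp', 'outdoor_temp', 'humidity'] 7
print(best_tradeoff(series, 2))  # (['indoor_temp', 'outdoor_temp'], 8)
```

    In the toy run, "door" is never added because it does not improve the forecast, and with a per-sensor cost of 2 the two-sensor subset gives the lowest combined cost.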

    Accurately forecasting temperatures in smart buildings using fewer sensors

    © 2017, Springer-Verlag London Ltd., part of Springer Nature. Forecasts of temperature in a “smart” building, i.e. one that is outfitted with sensors, are computed from data gathered by these sensors. Model predictive controllers can use accurate temperature forecasts to save energy by optimally using heating, ventilation and air conditioners while achieving comfort. We report on experiments from such a house. We select different sets of sensors, build a temperature model from each set, and compare the accuracy of these models. While a primary goal of this research area is to reduce energy consumption, in this paper, besides the cost of energy, we consider the cost of data collection and management. Our approach informs the selection of an optimal set of sensors for any model predictive controller to reduce overall costs, using any forecasting methodology. We use lasso regression with lagged observations, which compares favourably to previous methods using the same data.

    Impact of document representation on neural ad hoc retrieval

    © 2018 Association for Computing Machinery. Neural embeddings have been effectively integrated into information retrieval tasks including ad hoc retrieval. One of the benefits of neural embeddings is that they allow for the calculation of the similarity between queries and documents through vector similarity calculation methods. While such methods have been effective for document matching, they have an inherent bias towards documents that are sized relatively similarly. Therefore, the difference between the query and document lengths, referred to as the query-document size imbalance problem, becomes an issue when incorporating neural embeddings and their associated similarity calculation models into the ad hoc document retrieval process. In this paper, we propose that document representation methods need to be used to address the size imbalance problem and empirically show their impact on the performance of neural embedding-based ad hoc retrieval. In addition, we explore several types of document representation methods and investigate their impact on the retrieval process. We conduct our experiments on three widely used standard corpora, namely ClueWeb09B, ClueWeb12B and Robust04, and their associated topics. In summary, we find that document representation methods are able to effectively address the query-document size imbalance problem and significantly improve the performance of neural ad hoc retrieval. In addition, we find that a document representation method based on simple term frequency shows significantly better performance compared to more sophisticated representation methods such as neural composition and aspect-based methods.

    A Refinement of Lasso Regression Applied to Temperature Forecasting

    © 2018 The Authors. Published by Elsevier B.V. Model predictive controllers use accurate temperature forecasts to save energy by optimally controlling heating, ventilation and air conditioning equipment while achieving comfort for occupants. In a smart building, i.e. one that is outfitted with sensors, temperature forecasts are computed from data gathered by these sensors. Recently, accurate temperature forecasts have been generated using relatively few observations from each sensor. However, long sensor histories are available in smart houses. In this paper we consider improving forecast accuracy by using up to 24 hours of quarter-hourly readings. In particular, we overcome forecast inaccuracy that arises from the one standard error heuristic (1SE) in lasso regression. When there are many historical observations, low variance in the error estimations can result in excessively high values for the lasso hyperparameter λ. We propose the midfel refinement of lasso regression, which adjusts λ based on the shape of the error curve, resulting in improved forecast accuracy. We illustrate its effect in a setting where lasso regression is used to select sensors based on forecast accuracy. In this setting, midfel lasso regression using many historical observations has two effects: it improves accuracy and uses fewer sensors. Thus it potentially reduces costs arising both from energy usage and from sensor installation.
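
    For readers unfamiliar with the one standard error heuristic mentioned in this abstract, the following minimal sketch shows the standard rule: choose the largest λ whose cross-validated error lies within one standard error of the minimum. The midfel refinement itself is not shown, because the abstract does not define it. The function name and the cross-validation numbers are hypothetical.

```python
def lambda_one_se(lambdas, mean_errors, std_errors):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimizes the mean cross-validated error; lambda_1se is
    the largest lambda whose mean error is within one standard error
    of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    threshold = mean_errors[best] + std_errors[best]
    eligible = [lam for lam, err in zip(lambdas, mean_errors) if err <= threshold]
    return lambdas[best], max(eligible)

# Hypothetical cross-validation results for four candidate values of lambda.
LAMBDAS = [0.01, 0.1, 1.0, 10.0]
MEAN_ERRORS = [0.30, 0.25, 0.28, 0.60]
STD_ERRORS = [0.05, 0.04, 0.05, 0.08]

print(lambda_one_se(LAMBDAS, MEAN_ERRORS, STD_ERRORS))  # (0.1, 1.0)
```

    A larger λ shrinks more coefficients to zero, so the 1SE choice trades a small amount of estimated accuracy for a sparser model, which in this setting means fewer sensors.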